Spatial Models for Discrete Compositional Data

Authors

  • Dean Billheimer
  • Peter Guttorp
Abstract

We develop a spatial model for compositional data based on Aitchison's model of multiplicative error structure for proportions. An unobservable state proportion vector (composition) is assumed to exist for each sampled location. The state compositions may be related through time or space and may also depend on covariates. Discrete observations are accommodated by a conditional multinomial observation model given the state. Markov chain Monte Carlo (MCMC) is used to provide inference about the species compositions, covariates, and temporal and spatial dependence parameters. Graphical presentation of parameter estimates simplifies model building and interpretation. The methods developed include models for independent discrete compositions, time series of discrete and continuous compositions, and spatially related discrete and continuous compositions. The methods are illustrated by an application evaluating the ecological condition of the Delaware Bay estuary via benthic invertebrate composition.

We gratefully acknowledge the financial support of cooperative agreement CR between EPA and the University of Washington. This manuscript has not been subjected to EPA's peer and policy review and does not necessarily reflect the views of the agency.

Introduction

The relative abundance of different species in a biological community provides an example of compositions derived from discrete data. Such data are widespread in biological studies (e.g., ecological monitoring, fossil palynology) as well as market research (see, e.g., Allenby and Lenk, JASA). Data sampled from different spatial locations often exhibit spatial structure in the compositions. Our approach to such problems is a state-space model where each observation $y_j$, $j = 1, \dots, n$, is a $k$-vector of counts of different groups. We posit that each observation is a multinomial realization with the proportion parameter vector equal to an unobservable state composition $z_j$. We use Aitchison's logistic normal (LN) distribution in conjunction with a Markov random field
approach to spatial modeling (Besag; Mardia) to incorporate spatial dependence into the state distribution. In addition, methods to evaluate the effect of covariates are incorporated in the model specification. Markov chain Monte Carlo is used in a Bayesian setting for inference about the unobservable states and the state distribution parameters.

The method described above appears to be the first spatially explicit statistical approach for modeling compositions arising from count data. Mardia describes the Markov random field approach for multivariate normal observation vectors. He provides an example that uses logistic-transformed compositions, derived from Landsat classification, as the observation vector. His emphasis is on illustrating the methodology and estimating the spatial dependence parameter; there is no attempt to interpret the results in terms of the original compositions.

The LN distribution for compositional data was introduced by Aitchison and Shen, who studied its properties and potential uses. They also compare the LN class with the Dirichlet class of distributions for compositions. Aitchison presents the LN as an analysis tool for compositional data and establishes many of its mathematical and statistical properties. These results include the perturbation operation and its relevance to a central limit theorem for compositions. For a comprehensive account of statistical issues and analysis methods for independent continuous compositions, see Aitchison.

Several researchers have developed methods for spatially related compositions and categorical data. For categorical spatial data, Upton and Fingleton, and Cerioli, focus on the analysis of spatial contingency tables. Observations occur on a regular spatial grid and are assumed to be size-one realizations from a multinomial distribution at each site. The multinomial parameter, an element of $\mathcal{S}^k$ (the $k$-dimensional simplex), is identical for all spatial locations. The general approach is to modify the standard contingency table methodology for chi-square
tests of independence and goodness of fit to account for the spatial dependence between locations. The main emphasis is on evaluating and correcting for the effect of spatial dependence between locations.

Pawlowsky and Burger use the additive logratio transformation (Aitchison and Shen) to analyze spatially distributed continuous compositions. They term such data regionalized compositions, since the underlying random functions have a constant sum at each point in the sampling region. The additive logratio transformation is used to transform the composition to unconstrained Euclidean space (the multivariate logit scale), and the spatial covariance structure of these transformed compositions is then modeled by co-kriging. The authors note that a difficult part of the analysis is that problems have to be formulated in terms of logratios, and interpretation and description of spatial dependencies are also on that scale.

Finally, Besag et al. describe a spatial logistic regression formulation for binomial data. The response is mortality from prostate cancer for men classified by age group, time period, and birth cohort (date of birth). Spatial aspects of cancer mortality are not modeled directly. Instead, pairwise-difference prior distributions are used to accommodate the temporal ordering of parameters; that is, age, period, and cohort effects are assumed to be similar for parameters that are adjacent in time. The prior distributions specify a spatial smoothing effect for mortality probability across contiguous age, period, and birth cohort groups.

Perturbations and the Logistic Normal Distribution

Aitchison describes statistical analysis methods for compositional data with independent observations. These methods rely on the additive logratio transform (alr) to take observations from the simplex $\mathcal{S}^k$ to $(k-1)$-dimensional Euclidean space $\mathbb{R}^{k-1}$. Aitchison assumes that the transformed data can be adequately modeled by the $(k-1)$-variate normal distribution. This transformation and the assumption of multivariate
normality define a distribution on $\mathcal{S}^k$, the additive logistic normal (LN) distribution.

A central idea behind the choice of the alr transform is a perturbation operator that can be used to model errors for compositional data (Aitchison). This model produces a structure for errors on $\mathcal{S}^k$ that is more natural than the usual additive error model used in other areas of statistics. Briefly, an observed proportion vector $z$ is modeled as a location vector $\xi$ perturbed by an error $\epsilon$. The $i$th element of $z$ is

$$z_i = \frac{\xi_i \epsilon_i}{\sum_{j=1}^{k} \xi_j \epsilon_j}, \qquad \epsilon_i > 0 \text{ for all } i = 1, \dots, k.$$

This section introduces the perturbation operator and shows how it leads to the LN distribution. We first describe the properties of the perturbation operator and its interpretation for changes in compositions. The relation to the LN distribution is also described. The section concludes with a summary of other distributions for proportions and illustrates how the LN can be superior in fitting biological data.

Intuition of the Perturbation Operator

The perturbation operator $\oplus$ can be considered an addition operator for proportions. That is, for $z_1, z_2 \in \mathcal{S}^k$, $z_1 \oplus z_2 \in \mathcal{S}^k$. Aitchison describes a simple motivating example, related here, to introduce this operation. Suppose at time $t$ we have a unit of perishable substance consisting of components 1, 2, and 3 in the proportions $x = (x_1, x_2, x_3)$. Further suppose that a fraction $u_1$ of component 1 remains after each unit of time, while components 2 and 3 retain fractions $u_2$ and $u_3$ per unit time, respectively. At time $t + 1$ the remaining relative amounts of the three components are

$$\left( \frac{x_1 u_1}{c}, \frac{x_2 u_2}{c}, \frac{x_3 u_3}{c} \right), \qquad c = \sum_{i=1}^{3} x_i u_i.$$

In this example each $u_i$ must lie between zero and one. However, in biological situations where one or more of the components are allowed to grow, $u_i$ might be greater than one. A property of this operator, shown in the next section, is that only the relative sizes of the components of $u$ affect the resulting composition, so $u$ can be normalized to sum to one without affecting the result. Thus we may consider $x$ and $u$ to be
proportion vectors, and their sum $x \oplus u$ to also be a composition.

This operation is commonly used in statistics in Bayes' rule (Aitchison): a prior distribution for a discrete parameter is perturbed by a vector of likelihood values to form a posterior distribution. In biological applications, the perturbation operation is used by Edwards to describe relative gene frequencies after selection. In addition, Aebischer et al. use this operation to model the active selection of available resources by animals.

Properties of Perturbations

Here we define the composition and perturbation operations and describe their mathematical properties. Much of the development follows Aitchison. The scalar multiplication and vector space properties are introduced here for the first time; proofs of these properties are deferred to Appendix I. To ease notation, we use lower case to represent random variables as well as their corresponding values; the distinction should be clear from the context. When needed, we follow the usual notation of using upper case to denote the random variable and lower case to denote a realization.

Definition (Composition Operator $\mathcal{C}$; Aitchison). Suppose $a$ is a $k$-dimensional vector in positive Euclidean space $\mathbb{R}_+^k$. Define $\mathcal{C}(a)$ by

$$\mathcal{C}(a)_i = \frac{a_i}{\sum_{j=1}^{k} a_j}, \qquad i = 1, \dots, k,$$

where $\mathcal{C}(a)_i$ denotes the $i$th element of the $k$-vector. Thus the composition operator normalizes a positive $k$-vector to sum to one, and $\mathcal{C}(a) \in \mathcal{S}^k$.

Definition (Perturbation Operator $\oplus$; Aitchison). Let $u$ be a $k$-part composition and $a$ be a $k$-vector with positive elements. Define the perturbation operator as

$$u \oplus a = \mathcal{C}(u \circ a),$$

where $\circ$ denotes element-wise multiplication. Thus the composition $u$ is mapped to a location in $\mathcal{S}^k$ by the perturbing vector $a$. Aitchison shows several other simple properties of perturbations.

Property. The operation $\oplus\, a$ is a one-to-one transformation between $\mathcal{S}^k$ and $\mathcal{S}^k$. The inverse transformation is $\oplus\, a^{-1}$, where $a^{-1} = (a_1^{-1}, a_2^{-1}, \dots, a_k^{-1})$.

Property. The effect of any perturbing vector $a$ is the same as that for the composition $\mathcal{C}(a)$:

$$u \oplus a = u \oplus \mathcal{C}(a).$$
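The two definitions and the two properties above can be checked numerically in a few lines. The following is a minimal sketch in plain Python (standard library only), not the authors' code: the function names `closure`, `perturb`, `reciprocal`, and `all_close` are illustrative, and the proportions `x` and survival fractions `u` in the decay example are hypothetical values chosen only to exercise the formulas.

```python
import math
from typing import List, Sequence


def closure(a: Sequence[float]) -> List[float]:
    """Composition operator C: rescale a positive vector so its parts sum to one."""
    if any(x <= 0 for x in a):
        raise ValueError("closure requires strictly positive parts")
    total = sum(a)
    return [x / total for x in a]


def perturb(u: Sequence[float], a: Sequence[float]) -> List[float]:
    """Perturbation u ⊕ a = C(u ∘ a), where ∘ is the element-wise product."""
    if len(u) != len(a):
        raise ValueError("u and a must have the same number of parts")
    return closure([ui * ai for ui, ai in zip(u, a)])


def reciprocal(a: Sequence[float]) -> List[float]:
    """a^{-1} = (1/a_1, ..., 1/a_k); perturbing by it undoes perturbing by a."""
    return [1.0 / x for x in a]


def all_close(p: Sequence[float], q: Sequence[float], tol: float = 1e-12) -> bool:
    """Element-wise comparison up to floating-point rounding."""
    return len(p) == len(q) and all(
        math.isclose(x, y, abs_tol=tol) for x, y in zip(p, q)
    )


# Decay example (hypothetical values): x holds the current proportions and
# u the fraction of each component that survives one unit of time.
x = [0.5, 0.3, 0.2]
u = [0.9, 0.5, 0.8]
x_next = perturb(x, u)  # relative amounts at time t + 1, ≈ [0.592, 0.197, 0.211]

# Inverse property: perturbing by u^{-1} recovers the original composition.
assert all_close(perturb(x_next, reciprocal(u)), x)
# Scale invariance: only relative sizes of u matter, so u may be closed first.
assert all_close(perturb(x, u), perturb(x, closure(u)))
```

Because `closure` rejects non-positive parts, the sketch also enforces the requirement that perturbing vectors have strictly positive elements.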
Since only $\mathcal{C}(a)$ matters, without loss of generality we need only consider perturbing vectors in $\mathcal{S}^k$. We also present properties of the perturbation operator that are implied or left as exercises in Aitchison.

Property. For $I_k = (1/k, 1/k, \dots, 1/k)$, the operation $\oplus I_k$ is the identity operator for any $u \in \mathcal{S}^k$; i.e., $u \oplus I_k = u$.

Property. The operation $\oplus$ is commutative: for $u$ and $a$ in $\mathcal{S}^k$, $u \oplus a = a \oplus u$.
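On the alr scale, perturbation becomes ordinary vector addition, $\operatorname{alr}(u \oplus a) = \operatorname{alr}(u) + \operatorname{alr}(a)$, which is one way to see why $I_k$ acts as the identity (it maps to the zero vector) and why $\oplus$ is commutative. The sketch below checks this numerically and then simulates one location from the state-space model outlined in the introduction: a logistic normal state composition observed through multinomial counts. It is a minimal standard-library Python sketch under stated assumptions, not the paper's implementation. The alr divides by the last part (one common convention), the helper names and the mean `mu` and covariance `Sigma` are hypothetical, and it deliberately omits the Markov random field spatial dependence, covariates, and MCMC inference.

```python
import math
import random
from typing import List, Sequence


def closure(a: Sequence[float]) -> List[float]:
    """Composition operator C: rescale positive parts to sum to one."""
    total = sum(a)
    return [x / total for x in a]


def perturb(u: Sequence[float], a: Sequence[float]) -> List[float]:
    """Perturbation u ⊕ a = C(u ∘ a)."""
    return closure([x * y for x, y in zip(u, a)])


def alr(z: Sequence[float]) -> List[float]:
    """Additive logratio transform S^k -> R^(k-1), dividing by the last part."""
    return [math.log(zi / z[-1]) for zi in z[:-1]]


def alr_inv(y: Sequence[float]) -> List[float]:
    """Inverse alr R^(k-1) -> S^k: append a zero logratio, exponentiate, close."""
    return closure([math.exp(v) for v in y] + [1.0])


def cholesky(S: Sequence[Sequence[float]]) -> List[List[float]]:
    """Lower-triangular L with L L^T = S, for a small positive-definite S."""
    n = len(S)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][m] * L[j][m] for m in range(j))
            if i == j:
                L[i][i] = math.sqrt(S[i][i] - s)
            else:
                L[i][j] = (S[i][j] - s) / L[j][j]
    return L


def sample_logistic_normal(mu, Sigma, rng: random.Random) -> List[float]:
    """Draw a state composition z = alr^{-1}(y) with y ~ N(mu, Sigma)."""
    L = cholesky(Sigma)
    e = [rng.gauss(0.0, 1.0) for _ in mu]
    y = [mu[i] + sum(L[i][j] * e[j] for j in range(i + 1)) for i in range(len(mu))]
    return alr_inv(y)


def sample_counts(z: Sequence[float], n: int, rng: random.Random) -> List[int]:
    """Multinomial(n, z) counts, built from n independent categorical draws."""
    draws = rng.choices(range(len(z)), weights=z, k=n)
    return [draws.count(j) for j in range(len(z))]


# Perturbation is addition on the alr scale: alr(u ⊕ a) = alr(u) + alr(a).
u, a = [0.5, 0.3, 0.2], [0.2, 0.2, 0.6]
lhs = alr(perturb(u, a))
rhs = [p + q for p, q in zip(alr(u), alr(a))]
# The identity I_k = (1/k, ..., 1/k) maps to the zero vector.
identity = [1 / 3] * 3
alr_identity = alr(identity)  # [0.0, 0.0]

# One location: a logistic normal state observed through 100 multinomial counts.
rng = random.Random(2007)
z = sample_logistic_normal([0.0, 0.5], [[1.0, 0.3], [0.3, 0.5]], rng)
y_obs = sample_counts(z, 100, rng)
```

Building the multinomial from `random.Random.choices` keeps the sketch dependency-free but costs O(n) per location; with NumPy available, `numpy.random.Generator.multinomial` would be the usual choice.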

Publication date: 2007